OVEREXPRESSION OF MAMMALIAN AND VIRAL PROTEINS
Field of the Invention
The invention concerns genes and methods for
expressing eukaryotic and viral proteins at high levels
in eukaryotic cells.

Background of the Invention
Expression of eukaryotic gene products in
prokaryotes is sometimes limited by the presence of
codons that are infrequently used in E. coli. Expression
of such genes can be enhanced by systematic substitution
of the endogenous codons with codons overrepresented in
highly expressed prokaryotic genes (Robinson et al.
1984). It is commonly supposed that rare codons cause
pausing of the ribosome, which leads to a failure to
complete the nascent polypeptide chain and an uncoupling
of transcription and translation. The mRNA 3' end of the
stalled ribosome is exposed to cellular ribonucleases,
which decreases the stability of the transcript.
Summary of the Invention

The invention features a synthetic gene encoding a
protein normally expressed in mammalian cells wherein at
least one non-preferred or less preferred codon in the
natural gene encoding the mammalian protein has been
replaced by a preferred codon encoding the same amino
acid.

Preferred codons are: Ala (gcc); Arg (cgc); Asn
(aac); Asp (gac); Cys (tgc); Gln (cag); Gly (ggc); His
(cac); Ile (atc); Leu (ctg); Lys (aag); Pro (ccc); Phe
(ttc); Ser (agc); Thr (acc); Tyr (tac); and Val (gtg).
Less preferred codons are: Gly (ggg); Ile (att); Leu
(ctc); Ser (tcc); Val (gtc). All codons which do not fit
the description of preferred codons or less preferred
codons are non-preferred codons.
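
The three codon classes defined above can be expressed as a small
lookup. The following is a minimal sketch, not part of the original
disclosure; the function names are hypothetical. It follows the
definition literally, so codons for amino acids absent from both lists
(for example Glu, Met, Trp and stop codons) are reported as
non-preferred.

```python
# Codon classes as listed in the text (DNA alphabet, upper case).
PREFERRED = {
    "GCC", "CGC", "AAC", "GAC", "TGC", "CAG", "GGC", "CAC", "ATC",
    "CTG", "AAG", "CCC", "TTC", "AGC", "ACC", "TAC", "GTG",
}
LESS_PREFERRED = {"GGG", "ATT", "CTC", "TCC", "GTC"}


def classify_codon(codon):
    """Return 'preferred', 'less preferred' or 'non-preferred'."""
    codon = codon.upper().replace("U", "T")
    if codon in PREFERRED:
        return "preferred"
    if codon in LESS_PREFERRED:
        return "less preferred"
    # Literal reading of the definition: everything else is non-preferred.
    return "non-preferred"


def class_fractions(coding_sequence):
    """Fraction of codons in each class (hypothetical helper)."""
    seq = coding_sequence.upper().replace("U", "T")
    # Read complete codons only; trailing bases are ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = {"preferred": 0, "less preferred": 0, "non-preferred": 0}
    for codon in codons:
        counts[classify_codon(codon)] += 1
    total = len(codons)
    return {name: (n / total if total else 0.0) for name, n in counts.items()}
```

A call such as class_fractions(sequence) gives the share of
non-preferred codons in a natural gene, which is the quantity a
threshold on non-preferred codon content would be checked against.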
By "protein normally expressed in mammalian cells"
is meant a protein which is expressed in mammalian cells
under natural conditions. The term includes genes in the
mammalian genome such as Factor VIII, Factor IX,
interleukins, and other proteins. The term also includes
genes which are expressed in a mammalian cell under
disease conditions, such as oncogenes, as well as genes
which are encoded by a virus (including a retrovirus)
and which are expressed in mammalian cells post-infection.

In preferred embodiments, the synthetic gene is
capable of expressing said mammalian protein at a level
which is at least 110%, 150%, 200%, 500%, 1,000%, or
10,000% of that expressed by said natural gene in an in
vitro mammalian cell culture system under identical
conditions (i.e., same cell type, same culture
conditions, same expression vector).

Suitable cell culture systems for measuring
expression of the synthetic gene and corresponding
natural gene are described below. Other suitable
expression systems employing mammalian cells are well
known to those skilled in the art and are described in,
for example, the standard molecular biology reference
works noted below. Vectors suitable for expressing the
synthetic and natural genes are described below and in
the standard reference works described below. By
"expression" is meant protein expression. Expression can
be measured using an antibody specific for the protein of
interest. Such antibodies and measurement techniques are
well known to those skilled in the art. By "natural
gene" is meant the gene sequence which naturally encodes
the protein.

In other preferred embodiments at least 10%, 20%, 
30%, 40%, 50%, 60%, 70%, 80%, or 90% of the codons in the 
natural gene are non-preferred codons. 
In a preferred embodiment the protein is a retroviral
protein. In a more preferred embodiment the protein is a
lentiviral protein. In an even more preferred embodiment
the protein is an HIV protein. In other preferred
embodiments the protein is gag, pol, env, gp120, or
gp160. In other preferred embodiments the protein is a
human protein.

The invention also features a method for preparing
a synthetic gene encoding a protein normally expressed by
mammalian cells. The method includes identifying
non-preferred and less-preferred codons in the natural gene
encoding the protein and replacing one or more of the
non-preferred and less-preferred codons with a preferred
codon encoding the same amino acid as the replaced codon.

Under some circumstances it may be desirable to
replace a non-preferred codon with a less preferred
codon rather than a preferred codon.

It is not necessary to replace all less preferred
or non-preferred codons with preferred codons.
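
The replacement step of this method can be illustrated with a short
sketch. It is an illustration under stated assumptions, not the
procedure actually used: it assumes the standard genetic code, an
in-frame DNA coding sequence whose length is a multiple of three, and
a hypothetical `keep` argument naming codon positions to leave
untouched (for example, to preserve a restriction site or to perform
only a partial replacement). Amino acids with no preferred codon
listed in the text are left unchanged.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Preferred codon per amino acid, as listed in the text.
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG",
    "K": "AAG", "P": "CCC", "F": "TTC", "S": "AGC", "T": "ACC",
    "Y": "TAC", "V": "GTG",
}


def translate(seq):
    """Translate an in-frame DNA sequence to one-letter amino acids."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def replace_codons(seq, keep=frozenset()):
    """Swap each codon for the preferred synonym, except positions in keep."""
    seq = seq.upper()
    out = []
    for index in range(len(seq) // 3):
        codon = seq[3 * index:3 * index + 3]
        amino_acid = CODON_TABLE[codon]
        if index in keep:
            out.append(codon)  # deliberately left as in the natural gene
        else:
            # Fall back to the original codon when no preferred one is listed.
            out.append(PREFERRED.get(amino_acid, codon))
    return "".join(out)
```

Because every substitution is synonymous, translate(replace_codons(s))
equals translate(s) for any in-frame input s.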

In other preferred embodiments the invention features
vectors comprising the synthetic gene.

By "vector" is meant a DNA molecule, derived, e.g.,
from a plasmid, bacteriophage, or virus, into which
fragments of DNA may be inserted or cloned. A vector will
contain one or more unique restriction sites and may be
capable of autonomous replication in a defined host or
vehicle organism such that the cloned sequence is
reproducible. Thus, by "expression vector" is meant any
autonomous element capable of directing the synthesis of
a protein. Such DNA expression vectors include mammalian
plasmids and viruses.

The invention also features synthetic gene
fragments which encode a desired portion of the protein.
Such synthetic gene fragments are similar to the
synthetic genes of the invention except that they encode
only a portion of the protein. Such gene fragments
preferably encode at least 50, 100, 150, or 500
contiguous amino acids of the protein.

In constructing the synthetic genes of the
invention it may be desirable to avoid CpG sequences as
these sequences may cause gene silencing.
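
A candidate sequence can be screened for CpG dinucleotides with a
few lines of code. The sketch below is illustrative only; the
function names are hypothetical, and the observed/expected ratio uses
the commonly quoted formula (CpG count x length) / (C count x G
count), which is an assumption not taken from the text. Note that
CpG can arise at the junction of two codons, as in GCC followed by
GGC.

```python
def cpg_count(seq):
    """Number of CpG dinucleotides (C immediately followed by G)."""
    return seq.upper().count("CG")


def cpg_observed_expected(seq):
    """Observed/expected CpG ratio; 0.0 if the sequence lacks C or G."""
    s = seq.upper()
    c_count, g_count = s.count("C"), s.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return cpg_count(s) * len(s) / (c_count * g_count)


def cpg_positions(seq):
    """Zero-based start positions of every CpG in the sequence."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]
```

For example, cpg_positions("GCCGGC") reports the CpG created where
the Ala codon GCC meets the Gly codon GGC.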

The codon bias present in the HIV gp120 envelope
gene is also present in the genes encoding the gag and
pol proteins. Thus, replacement of a portion of the
non-preferred and less preferred codons found in these
genes with preferred codons should produce a gene capable
of higher level expression. A large fraction of the codons
in the human genes encoding Factor VIII and Factor IX are
non-preferred codons or less preferred codons. Replacement
of a portion of these codons with preferred codons should
yield genes capable of higher level expression in
mammalian cell culture. Conversely, it may be desirable
to replace preferred codons in a naturally occurring gene
with less-preferred codons as a means of lowering
expression.

Standard reference works describing the general
principles of recombinant DNA technology include Watson,
J.D. et al., Molecular Biology of the Gene, Volumes I and
II, the Benjamin/Cummings Publishing Company, Inc.,
publisher, Menlo Park, CA (1987); Darnell, J.E. et al.,
Molecular Cell Biology, Scientific American Books, Inc.,
Publisher, New York, N.Y. (1986); Old, R.W., et al.,
Principles of Gene Manipulation: An Introduction to
Genetic Engineering, 2d edition, University of California
Press, publisher, Berkeley, CA (1981); Maniatis, T. et
al., Molecular Cloning: A Laboratory Manual, 2nd Ed.,
Cold Spring Harbor Laboratory, publisher, Cold Spring
Harbor, NY (1989); and Current Protocols in Molecular
Biology, Ausubel et al., Wiley Press, New York, NY
(1989).

WO 96/09378
PCT/US95/11511

Detailed Description
Description of the Drawings
Figure 1 depicts the sequence of the synthetic
gp120 (SEQ ID NO: 34) and a synthetic gp160 (SEQ ID NO:
35) gene in which codons have been replaced by those
found in highly expressed human genes.

Figure 2 is a schematic drawing of the synthetic
gp120 (HIV-1 MN) gene. The shaded portions marked V1 to
V5 indicate hypervariable regions. The filled box
indicates the CD4 binding site. A limited number of the
unique restriction sites are shown: H (Hind3), Nh
(NheI), P (PstI), Na (NaeI), M (MluI), R (EcoRI), A
(AgeI) and No (NotI). The chemically synthesized DNA
fragments which served as PCR templates are shown below
the gp120 sequence, along with the locations of the
primers used for their amplification.

Figure 3 is a photograph of the results of
transient transfection assays used to measure gp120
expression. Gel electrophoresis of immunoprecipitated
supernatants of 293T cells transfected with plasmids
expressing gp120 encoded by the IIIB isolate of HIV-1
(gp120IIIb), by the MN isolate (gp120mn), by the MN
isolate modified by substitution of the endogenous leader
peptide with that of the CD5 antigen (gp120mnCD5L), or by
the chemically synthesized gene encoding the MN variant
with the human CD5 leader (syngp120mn). Supernatants were
harvested following a 12 hour labeling period 60 hours
post transfection.
Figure 4 is a graph depicting the results of ELISA
assays used to measure protein levels in supernatants of
transiently transfected 293T cells. Supernatants of 293T
cells transfected with plasmids expressing gp120 encoded
by the IIIB isolate of HIV-1 (gp120 IIIb), by the MN
isolate (gp120mn), by the MN isolate modified by
substitution of the endogenous leader peptide with that
of CD5 antigen (gp120mn CD5L), or by the chemically
synthesized gene encoding the MN variant with human CD5
leader (syngp120mn) were harvested after 4 days and
tested in a gp120/CD4 ELISA. The level of gp120 is
expressed in ng/ml.

Figure 5, panel A is a photograph of a gel
illustrating the results of an immunoprecipitation assay
used to measure expression of the native and synthetic
gp120 in the presence of rev in trans and the RRE in cis.
In this experiment 293T cells were transiently
transfected by calcium phosphate coprecipitation of 10 µg
of plasmid expressing: (A) the synthetic gp120MN sequence
and RRE in cis, (B) the gp120 portion of HIV-1 IIIB, (C)
the gp120 portion of HIV-1 IIIB and RRE in cis, all in
the presence or absence of rev expression. The RRE
constructs gp120IIIbRRE and syngp120mnRRE were generated
using an EagI/HpaI RRE fragment cloned by PCR from an
HIV-1 HXB2 proviral clone. Each gp120 expression plasmid
was cotransfected with 10 µg of either pCMVrev or CDM7
plasmid DNA. Supernatants were harvested 60 hours post
transfection, immunoprecipitated with CD4:IgG fusion
protein and protein A agarose, and run on a 7% reducing
SDS-PAGE. The gel exposure time was extended to allow the
induction of gp120IIIbrre by rev to be demonstrated.
Figure 5, panel B is a shorter exposure of a similar
experiment in which syngp120mnrre was cotransfected with
or without pCMVrev. Figure 5, panel C is a schematic
diagram of the constructs used in panel A.
Figure 6 is a comparison of the sequence of the
wildtype rat THY-1 gene (SEQ ID NO: 37) and a
synthetic rat THY-1 gene having HIV envelope codons.

Figure 7 is a schematic diagram of the synthetic
rat THY-1 gene, showing the sequences in the
precursor which direct the attachment of a
phosphatidylinositol glycan anchor, the restriction sites
used for assembly of the THY-1 constructs, and the
synthetic oligonucleotides used in the construction.

Figure 8 is a graph depicting the results of flow
cytometry analysis of 293T cells transiently transfected
with either wildtype rat THY-1 (dark line), rat THY-1
with envelope codons, or vector only (dotted line). Cells
were stained with an anti-rat THY-1 monoclonal antibody
followed by a polyclonal FITC-conjugated anti-mouse IgG
antibody 3 days after transfection.

Figure 9, panel A is a photograph of a gel
illustrating the results of immunoprecipitation analysis of
supernatants of human 293T cells transfected with
either syngp120mn or a construct, syngp120mn.rTHY-1env,
which has the rTHY-1env gene in the 3' untranslated
region of the syngp120mn gene. To generate the
syngp120mn.rTHY-1env construct, a fragment containing the
rTHY-1env gene was cloned into the
NotI site of the syngp120mn plasmid and tested for
correct orientation. Supernatants of 35S labelled cells
were harvested 72 hours post transfection, precipitated
with CD4:IgG fusion protein and protein A agarose, and
run on a 7% reducing SDS-PAGE. Figure 9, panel B is a
schematic diagram of the constructs used in the
experiment depicted in panel A of this figure.

Description of the Preferred Embodiments

Construction of a Synthetic gp120 Gene Having Codons
Found in Highly Expressed Human Genes

A codon frequency table for the envelope precursor
of the LAV subtype of HIV-1 was generated using software
developed by the University of Wisconsin Genetics
Computer Group. The results of that tabulation are
contrasted in Table 1 with the pattern of codon usage by
a collection of highly expressed human genes. For any
amino acid encoded by degenerate codons, the most favored
codon of the highly expressed genes is different from the
most favored codon of the HIV envelope precursor.
Moreover a simple rule describes the pattern of favored
envelope codons wherever it applies: preferred codons
maximize the number of adenine residues in the viral RNA.
In all cases but one this means that the codon in which
the third position is A is the most frequently used. In
the special case of serine, three codons equally
contribute one A residue to the mRNA; together these three
comprise 85% of the codons actually used in envelope
transcripts. A particularly striking example of the A bias
is found in the codon choice for arginine, in which the
AGA triplet comprises 88% of all codons. In addition to
the preponderance of A residues, a marked preference is
seen for uridine among degenerate codons whose third
residue must be a pyrimidine. Finally, the inconsistencies
among the less frequently used variants can be accounted
for by the observation that the dinucleotide CpG is
underrepresented; thus the third position is less likely
to be G whenever the second position is C, as in the
codons for alanine, proline, serine and threonine; and
the CGX triplets for arginine are hardly used at all.
TABLE 1: Codon Frequency in the HIV-1 IIIb env gene
         and in highly expressed human genes.

Amino acid   Codon   High   Env
Ala          GCC      53     27
             GCT      17     18
             GCA      13     50
             GCG      17      5
Arg          CGC      37      0
             CGT       7      4
             CGA       6      0
             CGG      21      0
             AGA      10     88
             AGG      18      8
Asn          AAC      78     30
             AAT      22     70
Asp          GAC      75     33
             GAT      25     67
Cys          TGC      68     16
             TGT      32     84
Gln          CAA      12     55
             CAG      88     45
Glu          GAA      25     67
             GAG      75     33
Gly          GGC      50      6
             GGT      12     13
             GGA      14     53
             GGG      24     28
His          CAC      79     25
             CAT      21     75
Ile          ATC      77     25
             ATT      18     31
             ATA       5     44
Leu          CTC      26     10
             CTT       5      7
             CTA       3     17
             CTG      58     17
             TTA       2     30
             TTG       6     20
Lys
Pro          CCC      48     27
             CCT      19     14
             CCA      16     55
             CCG      17      5
Phe          TTC      80     26
             TTT      20     74
Ser          TCC      28      8
             TCT      13      8
             TCA       5     22
             TCG       9      0
             AGC      34     22
             AGT      10     41
Thr
Tyr          TAC      74      8
             TAT      26     92
Val          GTC      25     12
             GTT       7      9
             GTA       5     62
             GTG      64     18

Codon frequency was calculated using the GCG program
established by the University of Wisconsin Genetics
Computer Group. Numbers represent the percentage of
cases in which the particular codon is used. Codon usage
frequencies of envelope genes of other HIV-1 virus
isolates are comparable and show a similar bias.
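
Percentages of the kind reported in Table 1 (each codon as a share
of all codons for the same amino acid) can be tabulated as follows.
This is a minimal sketch and is not the GCG program mentioned above;
it assumes the standard genetic code and an in-frame DNA sequence,
and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_usage_percent(seq):
    """Map each codon seen to its percentage among synonymous codons."""
    s = seq.upper()
    # Count complete codons only; trailing bases are ignored.
    counts = Counter(s[i:i + 3] for i in range(0, len(s) - 2, 3))
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {
        codon: round(100 * n / per_amino_acid[CODON_TABLE[codon]])
        for codon, n in counts.items()
    }
```

Running the function on an envelope coding sequence and on a set of
highly expressed human coding sequences would give two columns that
can be compared side by side, as in the table.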


In order to produce a gp120 gene capable of high
level expression in mammalian cells, a synthetic gene
encoding the gp120 segment of HIV-1 was constructed
(syngp120mn), based on the sequence of the most common
North American subtype, HIV-1 MN (Shaw et al. 1984; Gallo
et al. 1986). In this synthetic gp120 gene nearly all of
the native codons have been systematically replaced with
codons most frequently used in highly expressed human
genes (FIG. 1). This synthetic gene was assembled from
chemically synthesized oligonucleotides of 150 to 200
bases in length. If oligonucleotides exceeding 120 to
150 bases are chemically synthesized, the percentage of
full-length product can be low, and the vast excess of
material consists of shorter oligonucleotides. Since
these shorter fragments inhibit cloning and PCR
procedures, it can be very difficult to use
oligonucleotides exceeding a certain length. In order to
use crude synthesis material without prior purification,
single-stranded oligonucleotide pools were PCR amplified
before cloning. PCR products were purified in agarose
gels and used as templates in the next PCR step. Two
adjacent fragments could be co-amplified because of
overlapping sequences at the end of either fragment.
These fragments, which were between 350 and 400 bp in
size, were subcloned into a pCDM7-derived plasmid
containing the leader sequence of the CD5 surface
molecule followed by a NheI/PstI/MluI/EcoRI/BamHI
polylinker. Each of the restriction enzymes in this
polylinker represents a site that is present at either
the 5' or 3' end of the PCR-generated fragments. Thus,
by sequential subcloning of each of the 4 long fragments,
the whole gp120 gene was assembled. For each fragment 3
to 6 different clones were subcloned and sequenced prior
to assembly. A schematic drawing of the method used to
construct the synthetic gp120 is shown in FIG. 2. The
sequence of the synthetic gp120 gene (and a synthetic
gp160 gene created using the same approach) is presented
in FIG. 1.
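
The idea of building a long gene from shorter pieces that share
overlapping ends can be sketched in code. The example below is only
an illustration of the bookkeeping; it does not model PCR, and the
fragment size and overlap length are arbitrary assumptions rather
than values from the text. Function names are hypothetical.

```python
def split_with_overlap(seq, size, overlap):
    """Cut seq into pieces of at most `size` that share `overlap` bases."""
    if not 0 < overlap < size:
        raise ValueError("overlap must be positive and smaller than size")
    step = size - overlap
    fragments = []
    start = 0
    while True:
        fragments.append(seq[start:start + size])
        if start + size >= len(seq):
            break
        start += step
    return fragments


def join_overlapping(fragments, overlap):
    """Rebuild the full sequence, checking that adjacent ends match."""
    assembled = fragments[0]
    for fragment in fragments[1:]:
        if assembled[-overlap:] != fragment[:overlap]:
            raise ValueError("adjacent fragments do not overlap as expected")
        assembled += fragment[overlap:]
    return assembled
```

Splitting a designed sequence and then rejoining the pieces returns
the original sequence, which is a convenient check that every
junction between adjacent fragments is covered by an overlap.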

The mutation rate was considerable. The most
commonly found mutations were short (1 nucleotide) and
long (up to 30 nucleotides) deletions. In some cases it
was necessary to exchange parts with either synthetic
adapters or pieces from other subclones without mutation
in that particular region. Some deviations from strict
adherence to optimized codon usage were made to
accommodate the introduction of restriction sites into
the resulting gene to facilitate the replacement of
various segments (FIG. 2). These unique restriction sites
were introduced into the gene at approximately 100 bp
intervals. The native HIV leader sequence was exchanged
with the highly efficient leader peptide of the human CD5
antigen to facilitate secretion. The plasmid used for
construction is a derivative of the mammalian expression
vector pCDM7 transcribing the inserted gene under the
control of a strong human CMV immediate early promoter.
To compare the wild-type and synthetic gp120
coding sequences, the synthetic gp120 coding sequence was
inserted into a mammalian expression vector and tested in
transient transfection assays. Several different native
gp120 genes were used as controls to exclude variations
in expression levels between different virus isolates and
artifacts induced by distinct leader sequences. The
gp120 HIV IIIb construct used as control was generated by
PCR using a SalI/XhoI HIV-1 HXB2 envelope fragment as
template. To exclude PCR induced mutations a KpnI/EarI
fragment containing approximately 1.2 kb of the gene was
exchanged with the respective sequence from the proviral
clone. The wildtype gp120mn constructs used as controls
were cloned by PCR from HIV-1 MN infected C8166 cells
(AIDS Repository, Rockville, MD) and expressed gp120
either with a native envelope or a CD5 leader sequence.
Since proviral clones were not available in this case,
two clones of each construct were tested to avoid PCR
artifacts. To determine the amount of secreted gp120
semi-quantitatively, supernatants of 293T cells
transiently transfected by calcium phosphate
coprecipitation were immunoprecipitated with soluble
CD4:immunoglobulin fusion protein and protein A
sepharose.

The results of this analysis (FIG. 3) show that
the synthetic gene product is expressed at a very high
level compared to that of the native gp120 controls. The
molecular weight of the synthetic gp120 gene product was
comparable to control proteins (FIG. 3) and appeared to
be in the range of 100 to 110 kd. The slightly faster
migration can be explained by the fact that in some tumor
cell lines like 293T glycosylation is either not complete
or altered to some extent.

To compare expression more accurately, gp120
protein levels were quantitated using a gp120 ELISA with
CD4 in the immobilized phase. This analysis showed (FIG.
4) that ELISA data were comparable to the
immunoprecipitation data, with a gp120 concentration of
less than the background cutoff (5 ng/ml) for all the
native gp120 genes. Thus, expression of the synthetic
gp120 gene appears to be at least one order of magnitude
higher than wildtype gp120 genes. In the experiment
shown the increase was at least 25 fold.
The Role of rev in gp120 Expression

Since rev appears to exert its effect at several
steps in the expression of a viral transcript, the
possible role of non-translational effects in the
improved expression of the synthetic gp120 gene was
tested. First, to rule out the possibility that negative
signal elements conferring either increased mRNA
degradation or nuclear retention were eliminated by
changing the nucleotide sequence, cytoplasmic mRNA levels
were tested. Cytoplasmic RNA was prepared by NP40 lysis
of transiently transfected 293T cells and subsequent
elimination of the nuclei by centrifugation. Cytoplasmic
RNA was subsequently prepared from lysates by multiple
phenol extractions and precipitation, spotted on
nitrocellulose using a slot blot apparatus, and finally
hybridized with an envelope-specific probe.

Briefly, cytoplasmic mRNA of 293 cells transfected
with CDM7, gp120 IIIB, or syngp120 was isolated 36 hours
post transfection. Cytoplasmic RNA of HeLa cells
infected with wildtype vaccinia virus or recombinant
virus expressing gp120 IIIb or the synthetic gp120 gene
under the control of the 7.5 promoter was isolated 16
hours post infection. Equal amounts were spotted on
nitrocellulose using a slot blot device and hybridized
with a randomly labelled 1.5 kb gp120IIIb probe. RNA levels
were quantitated by scanning the hybridized membranes
with a phosphorimager. The procedures used are described
in greater detail below.

This experiment demonstrated that there was no
significant difference in the mRNA levels of cells
transfected with either the native or synthetic gp120
gene. In fact, in some experiments the cytoplasmic mRNA
level of the synthetic gp120 gene was even lower than
that of the native gp120 gene.

These data were confirmed by measuring expression
from recombinant vaccinia viruses. Human 293 cells or
HeLa cells were infected with vaccinia virus expressing
wildtype gp120 IIIb or syngp120mn at a multiplicity of
infection of at least 10. Supernatants were harvested 24
hours post infection and immunoprecipitated with
CD4:immunoglobulin fusion protein and protein A sepharose.
The procedures used in this experiment are described in
greater detail below.

This experiment showed that the increased
expression of the synthetic gene was still observed when
the endogenous gene product and the synthetic gene
product were expressed from vaccinia virus recombinants
under the control of the strong mixed early and late 7.5k
promoter. Because vaccinia virus mRNAs are transcribed
and translated in the cytoplasm, increased expression of
the synthetic envelope gene in this experiment cannot be
attributed to improved export from the nucleus. This
experiment was repeated in two additional human cell
types, the kidney cancer cell line 293 and HeLa cells.
As with transfected 293T cells, mRNA levels were similar
in 293 cells infected with either recombinant vaccinia
virus.
Codon Usage in Lentivirus

Because it appears that codon usage has a
significant impact on expression in mammalian cells, the
codon frequency in the envelope genes of other
retroviruses was examined. This study found no clear
pattern of codon preference between retroviruses in
general. However, if viruses from the lentivirus genus,
to which HIV-1 belongs, were analyzed separately,
codon usage bias almost identical to that of HIV-1 was
found. A codon frequency table from the envelope
glycoproteins of a variety of (predominantly type C)
retroviruses excluding the lentiviruses was prepared, and
compared to a codon frequency table created from the
envelope sequences of four lentiviruses not closely
related to HIV-1 (caprine arthritis encephalitis virus,
equine infectious anemia virus, feline immunodeficiency
virus, and visna virus) (Table 2). The codon usage
pattern for lentiviruses is strikingly similar to that of
HIV-1. In all cases but one, the preferred codon for
HIV-1 is the same as the preferred codon for the other
lentiviruses. The exception is proline, which is encoded
by CCT in 41% of non-HIV lentiviral envelope residues,
and by CCA in 40% of residues, a situation which clearly
also reflects a significant preference for the triplet
ending in A. The pattern of codon usage by the
non-lentiviral envelope proteins does not show a similar
predominance of A residues, and is also not as skewed
toward third position C and G residues as is the codon
usage for the highly expressed human genes. In general
non-lentiviral retroviruses appear to exploit the
different codons more equally, a pattern they share with
less highly expressed human genes.
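
The contrast drawn here (third-position A in lentiviral envelope
genes versus third-position C and G in highly expressed human genes)
can be summarized for any coding sequence by the base composition at
the third codon position. The sketch below is illustrative; the
function name is hypothetical and, for simplicity, it counts all
codons, including the non-degenerate Met and Trp codons.

```python
from collections import Counter


def third_position_percent(seq):
    """Percentage of A, C, G and T at the third position of each codon."""
    s = seq.upper().replace("U", "T")
    thirds = [s[i + 2] for i in range(0, len(s) - 2, 3)]
    if not thirds:
        return {base: 0.0 for base in "ACGT"}
    counts = Counter(thirds)
    return {base: 100 * counts[base] / len(thirds) for base in "ACGT"}
```

A sequence dominated by codons such as AGA, GCA and GGA yields a
high A percentage, whereas one built from the preferred codons listed
in the Summary yields mostly C and G.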



TABLE 2: Codon frequency in the envelope gene of
lentiviruses (lenti) and non-lentiviral
retroviruses (other).

Numbers represent the percentage of cases in which a
particular codon is used. Codon usage of non-lentiviral
retroviruses was compiled from the envelope precursor
sequences of bovine leukemia virus, feline leukemia
virus, human T-cell leukemia virus type I, human T-cell
lymphotropic virus type II, the mink cell focus-forming
isolate of murine leukemia virus (MuLV), the Rauscher
spleen focus-forming isolate, the 10A1 isolate, the 4070A
amphotropic isolate and the myeloproliferative leukemia
virus isolate, and from rat leukemia virus, simian
sarcoma virus, simian T-cell leukemia virus, leukemogenic
retrovirus and gibbon ape leukemia virus. The codon
frequency tables for the non-HIV, non-SIV lentiviruses
were compiled from the envelope precursor sequences for
caprine arthritis encephalitis virus, equine infectious
anemia virus, feline immunodeficiency virus, and visna
virus.

In addition to the prevalence of A containing
codons, lentiviral codons adhere to the HIV pattern of
strong CpG underrepresentation, so that the third
position for alanine, proline, serine and threonine
triplets is rarely G. The retroviral envelope triplets
show a similar, but less pronounced, underrepresentation
of CpG. The most obvious difference between lentiviruses
and other retroviruses with respect to CpG prevalence
lies in the usage of the CGX variant of arginine
triplets, which is reasonably frequently represented
among the retroviral envelope coding sequences, but is
almost never present among the comparable lentivirus
sequences.

Differences in rev Dependence Between Native and Synthetic gp120

To examine whether regulation by rev is connected
to HIV-1 codon usage, the influence of rev on the
expression of both native and synthetic gene was
investigated. Since regulation by rev requires the
rev-binding site RRE in cis, constructs were made in which
this binding site was cloned into the 3' untranslated
region of both the native and the synthetic gene. These
plasmids were co-transfected with rev or a control
plasmid in trans into 293T cells, and gp120 expression
levels in supernatants were measured semiquantitatively
by immunoprecipitation. The procedures used in this
experiment are described in greater detail below.

As shown in FIG. 5, panels A and B, rev
upregulates the native gp120 gene, but has no effect on
the expression of the synthetic gp120 gene. Thus, the
action of rev is not apparent on a substrate which lacks
the coding sequence of endogenous viral envelope
sequences.

Expression of a synthetic rat THY-1 gene with HIV
envelope codons

The above-described experiment suggests that in
fact "envelope sequences" have to be present for rev
regulation. In order to test this hypothesis, a
synthetic version of the gene encoding the small,
typically highly expressed cell surface protein, rat
THY-1 antigen, was prepared. The synthetic version of
the rat THY-1 gene was designed to have a codon usage
like that of HIV gp120. In designing this synthetic gene
AUUUA sequences, which are associated with mRNA
instability, were avoided. In addition, two restriction
sites were introduced to simplify manipulation of the
resulting gene (FIG. 6). This synthetic gene with the
HIV envelope codon usage (rTHY-1env) was generated using
three 150 to 170 mer oligonucleotides (FIG. 7). In
contrast to the syngp120mn gene, PCR products were
directly cloned and assembled in pUC12, and subsequently
cloned into pCDM7.
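
The design step described here (encoding a protein with
envelope-style codons while avoiding AUUUA) can be sketched as
follows. This is an illustration, not the procedure used for
rTHY-1env. The codon choices are the most frequent entries of the Env
column of Table 1; the entries for Lys (AAA) and Thr (ACA) are
assumptions based on the stated rule that the A-ending codon is
favored, since those rows are not legible in the table. Function
names are hypothetical.

```python
# Most frequent envelope codon per amino acid (Env column of Table 1).
# K and T are ASSUMED (A-ending rule); M and W have a single codon.
ENV_CODON = {
    "A": "GCA", "R": "AGA", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGA", "H": "CAT", "I": "ATA",
    "L": "TTA", "P": "CCA", "F": "TTT", "S": "AGT", "Y": "TAT",
    "V": "GTA", "K": "AAA", "T": "ACA", "M": "ATG", "W": "TGG",
}


def back_translate_env(protein):
    """Encode a one-letter protein sequence with envelope-style codons."""
    return "".join(ENV_CODON[residue] for residue in protein.upper())


def find_motif(dna, motif="ATTTA"):
    """Zero-based start positions of motif, overlapping hits included."""
    dna, motif = dna.upper(), motif.upper()
    positions = []
    start = dna.find(motif)
    while start != -1:
        positions.append(start)
        start = dna.find(motif, start + 1)
    return positions
```

Because these codons are rich in A and T, the motif can appear at
codon junctions; for example, Asn followed by Leu gives AATTTA. Any
reported position would then be resolved by choosing a different
synonymous codon at that site.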
Expression levels of native rTHY-1 and rTHY-1 with
the HIV envelope codons were measured in transiently
transfected 293T cells. FIG. 8 shows that the expression
of the native THY-1 gene is above the background
level of the control transfected cells (pCDM7). In
contrast, expression of the synthetic rat THY-1 gene is
lower, as shown by a shift of the peak toward a lower
channel number. To prove that no negative sequence
elements promoting degradation were inadvertently
introduced, a construct was generated in which the
rTHY-1env gene was cloned at the 3' end of the synthetic
gp120 gene (FIG. 9, panel B). In this experiment 293T
cells were transfected with either the syngp120mn gene or
the syngp120mn.rTHY-1env construct. Expression was
measured by immunoprecipitation with CD4:IgG fusion
protein and protein A agarose. The procedures used in
this experiment are described in greater detail below.

Since the synthetic gp120 gene has a UAG stop
codon, rTHY-1env is not translated from this transcript.
If negative elements conferring enhanced degradation were
present in the sequence, gp120 protein levels expressed
from this construct should be decreased in comparison to
the syngp120mn construct without rTHY-1env. FIG. 9,
panel A, shows that the expression of both constructs is
similar, indicating that the low expression must be
linked to translation.
Rev-dependent expression of a synthetic rat THY-1
gene with envelope codons

To explore whether rev is able to regulate
expression of a rat THY-1 gene having env codons, the
rev-responsiveness of a rat THY-1env construct having a
3' RRE was examined. Human 293T cells were cotransfected
with ratTHY-1envrre and either CDM7 or pCMVrev. At 60
hours post transfection cells were detached with 1 mM
EDTA in PBS and stained with the OX-7 anti rTHY-1 mouse
monoclonal antibody and a secondary FITC-conjugated
antibody. Fluorescence intensity was measured using an
EPICS XL cytofluorometer. These procedures are described
in greater detail below.

In repeated experiments, a slight increase of
rTHY-1env expression was detected if rev was
cotransfected with the rTHY-1env gene. To further
increase the sensitivity of the assay system a construct
expressing a secreted version of rTHY-1env was generated.
This construct should produce more reliable data because
the accumulated amount of secreted protein in the
supernatant reflects the result of protein production
over an extended period, in contrast to surface expressed
protein, which appears to more closely reflect the
current production rate. A gene capable of expressing a
secreted form was prepared by PCR using forward and
reverse primers annealing 3' of the endogenous leader
sequence and 5' of the sequence motif required for
phosphatidylinositol glycan anchorage respectively. The
PCR product was cloned into a plasmid which already
contained a CD5 leader sequence, thus generating a
construct in which the membrane anchor has been deleted
and the leader sequence exchanged by a heterologous (and
probably more efficient) leader peptide.

The rev-responsiveness of the secreted form of
ratTHY-1env was measured by immunoprecipitation of
supernatants of human 293T cells cotransfected with a
plasmid expressing a secreted form of ratTHY-1env and the
RRE sequence in cis (rTHY-1envPI-rre) and either CDM7 or
pCMVrev. The rTHY-1envPI-rre construct was made by PCR
using the oligonucleotides
cgcggggctagcgcaaagagtaataagtttaac as forward and
cgcggatcccttgtattttgtactaataa as reverse primers and the
synthetic rTHY-1env construct as template. After
digestion with NheI and NotI the PCR fragment was cloned
into a plasmid containing CD5 leader and RRE sequences.
Supernatants of 35S labelled cells were harvested
post transfection, precipitated with a mouse
monoclonal antibody OX7 against rTHY-1 and anti mouse IgG
sepharose, and run on a 12% reducing SDS-PAGE.

In this experiment the induction of rTHY-1env by
rev was much more prominent and clearcut than in the
above-described experiment, which strongly suggests that rev
is able to translationally regulate transcripts that are
suppressed by low-usage codons.
Rev-independent expression of a rTHY-1env:immunoglobulin
fusion protein

To test whether low-usage codons must be present
throughout the whole coding sequence or whether a short
region is sufficient to confer rev-responsiveness, a
rTHY-1env:immunoglobulin fusion protein was generated.
In this construct the rTHY-1env gene (without the
sequence motif responsible for phosphatidylinositol
glycan anchorage) is linked to the human IgG1 hinge, CH2
and CH3 domains. This construct was generated by anchor
PCR using primers with NheI and BamHI restriction sites
and rTHY-1env as template. The PCR fragment was cloned
into a plasmid containing the leader sequence of the CD5
surface molecule and the hinge, CH2 and CH3 parts of
human IgG1 immunoglobulin. A Hind3/EagI fragment
containing the rTHY-1enveg1 insert was subsequently
cloned into a pCDM7-derived plasmid with the RRE
sequence.


To measure the rev-responsiveness of the
rTHY-1enveg1rre construct, human 293T cells were
cotransfected with rTHY-1enveg1rre and either
pCDM7 or pCMVrev. The rTHY-1enveg1rre construct was made
by anchor PCR using forward and reverse primers with NheI
and BamHI restriction sites respectively. The PCR
fragment was cloned into a plasmid containing a CD5
leader and human IgG1 hinge, CH2 and CH3 domains.
Supernatants of 35S-labelled cells were harvested 72 hours
post transfection, precipitated with a mouse monoclonal
antibody OX7 against rTHY-1 and anti-mouse IgG sepharose,
and run on a 12% reducing SDS-PAGE. The procedures used
are described in greater detail below.

As with the product of the rTHY-1envPI- gene, this
rTHY-1env/immunoglobulin fusion protein is secreted into
the supernatant. Thus, this gene should be responsive to
rev-induction. However, in contrast to rTHY-1envPI-,
cotransfection of rev in trans induced no or only a
negligible increase of rTHY-1enveg1 expression.

The expression of rTHY-1:immunoglobulin fusion
protein with native rTHY-1 or HIV envelope codons was
measured by immunoprecipitation. Briefly, human 293T
cells were transfected with either rTHY-1enveg1 (env codons)
or rTHY-1wteg1 (native codons). The rTHY-1wteg1
construct was generated in a manner similar to that used
for the rTHY-1enveg1 construct, with the exception that a
plasmid containing the native rTHY-1 gene was used as
template. Supernatants of 35S-labelled cells were
harvested 72 hours post transfection, precipitated with a
mouse monoclonal antibody OX7 against rTHY-1 and anti-
mouse IgG sepharose, and run on a 12% reducing SDS-PAGE.
The procedures used in this experiment are described in
greater detail below.

Expression levels of rTHY-1enveg1 were decreased
in comparison to a similar construct with wildtype rTHY-1
as the fusion partner, but were still considerably higher
than rTHY-1env. Accordingly, both parts of the fusion
protein influenced expression levels. The addition of
rTHY-1env did not restrict expression to the level
seen for rTHY-1env alone. Thus, regulation by rev
appears to be ineffective unless protein expression is
almost completely suppressed.

Codon preference in HIV-1 envelope genes

Direct comparison between codon usage frequency of
HIV envelope and highly expressed human genes reveals a
striking difference for all twenty amino acids. One
simple measure of the statistical significance of this
codon preference is the finding that among the nine amino
acids with two fold codon degeneracy, the favored third
residue is A or U in all nine. The probability that
all nine of two equiprobable choices will be the same is
approximately 0.004, and hence by any conventional
measure the third residue choice cannot be considered
random. Further evidence of a skewed codon preference is
found among the more degenerate codons, where a strong
selection for triplets bearing adenine can be seen. This
contrasts with the pattern for highly expressed genes,
which favor codons bearing C, or less commonly G, in the
third position of codons with three or more fold
degeneracy.
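
The figure of approximately 0.004 can be checked with a short calculation. The sketch below assumes, as the text does, that each of the nine amino acids independently picks one of two equally likely third-residue classes; the function name is illustrative only and is not part of the original procedures.

```python
from fractions import Fraction


def prob_all_same(n_choices: int, n_options: int = 2) -> Fraction:
    """Probability that n independent, equiprobable picks among
    n_options all land on the same option."""
    # Any one of the n_options may be the shared outcome; each of the
    # n_choices picks then has to match it with probability 1/n_options.
    return n_options * Fraction(1, n_options) ** n_choices


p = prob_all_same(9)
print(p, float(p))  # 1/256, roughly 0.0039
```

The value 1/256 rounds to the 0.004 quoted above. Requiring the shared class to be A or U specifically, rather than either class, would halve it.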

The systematic exchange of native codons with
codons of highly expressed human genes dramatically
increased expression of gp120. A quantitative analysis
by ELISA showed that expression of the synthetic gene was
at least 25 fold higher in comparison to native gp120
after transient transfection into human 293 cells. The
concentration levels in the ELISA experiment shown were
rather low. Since an ELISA was used for quantification
which is based on gp120 binding to CD4, only native, non-
denatured material was detected. This may explain the
apparent low expression. Measurement of cytoplasmic mRNA
levels demonstrated that the difference in protein


WO 96/09378 


PCT/US95/11511 



expression is due to translational differences and not
mRNA stability.
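
The codon exchange described here can be illustrated with a small sketch. It assumes the standard genetic code and uses only the preferred codons listed in the Summary; amino acids for which that list gives no preferred codon (and stop codons) are left unchanged. The function names and the short input fragment are illustrative, not the procedure actually used to design the synthetic genes.

```python
from itertools import product

BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon (lower case DNA) -> one-letter amino acid.
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Preferred codons as listed in the Summary of the Invention.
PREFERRED = {
    "A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc", "Q": "cag",
    "G": "ggc", "H": "cac", "I": "atc", "L": "ctg", "K": "aag", "P": "ccc",
    "F": "ttc", "S": "agc", "T": "acc", "Y": "tac", "V": "gtg",
}


def translate(cds: str) -> str:
    """Translate a coding sequence (spaces allowed) into one-letter amino acids."""
    cds = cds.lower().replace(" ", "")
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str) -> str:
    """Replace each codon by the preferred codon for the same amino acid."""
    cds = cds.lower().replace(" ", "")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        # Keep the original codon when no preferred codon is listed.
        out.append(PREFERRED.get(CODON_TO_AA[codon], codon))
    return "".join(out)


native = "atg aat cca gta ata agt"  # short illustrative fragment
print(recode(native))
```

Because only synonymous codons are substituted, `translate(recode(x))` equals `translate(x)` for any in-frame input.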

Retroviruses in general do not show a similar
preference towards A and T as found for HIV. But if this
family is divided into two subgroups, lentiviruses and
non-lentiviral retroviruses, a similar preference for A
and, less frequently, T, is detected at the third codon
position for lentiviruses. Thus, the available evidence
suggests that lentiviruses retain a characteristic
pattern of envelope codons not because of an inherent
advantage to the reverse transcription or replication of
such residues, but rather for some reason peculiar to the
physiology of that class of viruses. The major
difference between lentiviruses and non-complex
retroviruses is the presence of additional regulatory and
non-essential accessory genes in lentiviruses, as already
mentioned. Thus, one simple explanation for the
restriction of envelope expression might be that an
important regulatory mechanism of one of these additional
molecules is based on it. In fact, one of these proteins,
rev, most likely has homologues in all lentiviruses.
Thus codon usage in viral mRNA is
used to create a class of transcripts which is
susceptible to the stimulatory action of rev. This
hypothesis was proved using a strategy similar to that above,
but this time codon usage was changed in the inverse
direction. Codon usage of a highly expressed cellular
gene was substituted with the most frequently used codons
in the HIV envelope. As assumed, expression levels were
considerably lower in comparison to the native molecule,
almost two orders of magnitude when analyzed by
immunofluorescence of the surface expressed molecule (see
4.7). If rev was coexpressed in trans and an RRE element
was present in cis, only a slight induction was found for
the surface molecule. However, if THY-1 was expressed as
a secreted molecule, the induction by rev was much more
prominent, supporting the above hypothesis. This can
probably be explained by accumulation of secreted protein
in the supernatant, which considerably amplifies the rev
effect. If rev only induces a minor increase for surface
molecules in general, induction of HIV envelope by rev
cannot have the purpose of an increased surface
abundance, but rather of an increased intracellular
level.

To test whether small subtotal elements of a gene
are sufficient to restrict expression and render it rev-
dependent, rTHY-1env:immunoglobulin fusion proteins were
generated, in which only about one third of the total
gene had the envelope codon usage. Expression levels of
this construct were on an intermediate level, indicating
that the rTHY-1env negative sequence element is not
dominant over the immunoglobulin part. This fusion
protein was not or only slightly rev-responsive,
indicating that only genes almost completely suppressed
can be rev-responsive.

Another characteristic feature that was found in
the codon frequency tables is a striking
underrepresentation of CpG triplets. In a comparative
study of codon usage in E. coli, yeast, drosophila and
primates it was shown that in a high number of analyzed
primate genes the 8 least used codons contain all codons
with the CpG dinucleotide sequence. Avoidance of codons
containing this dinucleotide motif was also found in the
sequence of other retroviruses. It seems plausible that
the reason for underrepresentation of CpG-bearing
triplets has something to do with avoidance of gene
silencing by methylation of CpG cytosines. The expected
number of CpG dinucleotides [...]
composition. This might indicate that the possibility of
high expression is restored, and that the gene in fact
has to be highly expressed at some point during viral
pathogenesis.
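
One common way to quantify CpG underrepresentation is the observed/expected ratio, where the expected count is derived from the base composition of the sequence. The sketch below uses that widely used definition as an assumption; the text does not state which formula was applied, and the function name is hypothetical.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio of a DNA sequence.

    expected = (#C * #G) / length, i.e. the CpG count predicted from
    base composition alone; a ratio well below 1 indicates depletion.
    """
    seq = seq.lower().replace(" ", "")
    n = len(seq)
    n_c, n_g = seq.count("c"), seq.count("g")
    if n < 2 or n_c == 0 or n_g == 0:
        raise ValueError("sequence must contain both C and G")
    observed = seq.count("cg")  # 'cg' cannot overlap itself
    expected = n_c * n_g / n
    return observed / expected


print(cpg_obs_exp("ccgg"))      # one CpG observed, one expected
print(cpg_obs_exp("aaccttgg"))  # C and G present but no CpG
```

Applied to a full coding sequence, a ratio near 1 means CpG occurs about as often as the base composition predicts.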

The results presented herein clearly indicate that
codon preference has a severe effect on protein levels,
and suggest that translational elongation is controlling
mammalian gene expression. However, other factors may
play a role. First, the abundance of not maximally loaded
mRNAs in eukaryotic cells indicates that initiation is
rate limiting for translation in at least some cases,
since otherwise all transcripts would be completely
covered by ribosomes. Furthermore, if ribosome stalling
and subsequent mRNA degradation were the mechanism,
suppression by rare codons could most likely not be
reversed by any regulatory mechanism like the one
presented herein. One possible explanation for the
influence of both initiation and elongation on
translational activity is that the rate of initiation, or
access to ribosomes, is controlled in part by cues
distributed throughout the RNA, such that the lentiviral
codons predispose the RNA to accumulate in a pool of
poorly initiated RNAs. However, this limitation need not
be kinetic; for example, the choice of codons could
influence the probability that a given translation
product, once initiated, is properly completed. Under
this mechanism, abundance of less favored codons would
incur a significant cumulative probability of failure to
complete the nascent polypeptide chain. The sequestered
RNA would then be lent an improved rate of initiation by
the action of rev. Since adenine residues are abundant
in rev-responsive transcripts, it could be that RNA
adenine methylation mediates this translational
suppression.
Detailed Procedure

The following procedures were used in the above-
described experiments.
Sequence analyses
Sequence analyses employed the software developed
by the University of Wisconsin Computer Group.
Plasmid constructions

Plasmid constructions employed the following
methods. Vectors and insert DNA were digested at a
concentration of 0.5 µg/10 µl in the appropriate
restriction buffer for 1 - 4 hours (total reaction volume
approximately 30 µl). Digested vector was treated with
10% (v/v) of 1 µg/ml calf intestine alkaline phosphatase
for 30 min prior to gel electrophoresis. Both vector and
insert digests (5 to 10 µl each) were run on a 1.5% low
melting agarose gel with TAE buffer. Gel slices
containing bands of interest were transferred into a 1.5
ml reaction tube, melted at 65°C and directly added to
the ligation without removal of the agarose. Ligations
were typically done in a total volume of 25 µl in 1x Low
Buffer, 1x Ligation Additions with 200-400 U of ligase, 1
µl of vector, and 4 µl of insert. When necessary, 5'
overhanging ends were filled by adding 1/10 volume of 250
µM dNTPs and 2-5 U of Klenow polymerase to heat
inactivated or phenol extracted digests and incubating
for approximately 20 min at room temperature. When
necessary, 3' overhanging ends were filled by adding 1/10
volume of 2.5 mM dNTPs and 5-10 U of T4 DNA polymerase to
heat inactivated or phenol extracted digests, followed by
incubation at 37°C for 30 min. The following buffers
were used in these reactions: 10x Low buffer (60 mM Tris
HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA, 70 mM
β-mercaptoethanol, 0.02% NaN3); 10x Medium buffer (60
mM Tris HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml
BSA, 70 mM β-mercaptoethanol, 0.02% NaN3); 10x Ligation
additions (1 mM ATP, 20 mM DTT, 1 mg/ml BSA, 10 mM
spermidine); 50x TAE (2 M Tris acetate, 50 mM EDTA).
Oligonucleotide synthesis and purification

Oligonucleotides were produced on a Milligen 8750
synthesizer (Millipore). The columns were eluted with 1
ml of 30% ammonium hydroxide, and the eluted
oligonucleotides were deblocked at 55°C for 6 to 12
hours. After deblocking, 150 µl of oligonucleotide were
precipitated with 10x volume of unsaturated n-butanol in
1.5 ml reaction tubes, followed by centrifugation at
15,000 rpm in a microfuge. The pellet was washed with
70% ethanol and resuspended in 50 µl of H2O. The
concentration was determined by measuring the optical
density at 260 nm in a dilution of 1:333 (1 OD260 = 30
µg/ml).
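
The concentration calculation implied by this measurement can be sketched as follows. The conversion factor (1 OD260 = 30 µg/ml) and the 1:333 dilution come from the text; the absorbance reading in the example is a made-up value, and the function name is illustrative.

```python
def concentration_ug_per_ml(od260: float,
                            dilution_factor: float,
                            ug_per_ml_per_od: float) -> float:
    """Stock concentration from an OD260 reading of a diluted sample.

    od260             absorbance of the diluted sample at 260 nm
    dilution_factor   e.g. 333 for a 1:333 dilution
    ug_per_ml_per_od  conversion factor, e.g. 30 for oligonucleotides
    """
    if od260 < 0 or dilution_factor <= 0 or ug_per_ml_per_od <= 0:
        raise ValueError("inputs must be positive (OD may be zero)")
    return od260 * dilution_factor * ug_per_ml_per_od


# Hypothetical reading of 0.15 for an oligonucleotide diluted 1:333.
stock = concentration_ug_per_ml(0.15, 333, 30)
print(f"{stock:.1f} ug/ml")
```

With a 1:333 dilution and 30 µg/ml per OD unit, the product of the two factors is 9990, so the reading multiplied by ten gives the stock concentration in mg/ml to a good approximation.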

The following oligonucleotides were used for
construction of the synthetic gp120 gene (all sequences
shown in this text are in 5' to 3' direction).

oligo 1 forward (NheI): cgc ggg cta gcc acc gag
aag ctg (SEQ ID NO: 1).

oligo 1: acc gag aag ctg tgg gtg acc gtg tac tac
ggc gtg ccc gtg tgg aag ag ag gcc acc acc acc ctg ttc tgc
gcc agc gac gcc aag gcg tac gac acc gag gtg cac aac gtg
tgg gcc acc cag gcg tgc gtg ccc acc gac ccc aac ccc cag
gag gtg gag ctc gtg aac gtg acc gag aac ttc aac atg (SEQ
ID NO: 2).

oligo 1 reverse: cca cca tgt tgt tct tcc aca tgt
tga agt tct c (SEQ ID NO: 3).

oligo 2 forward: gac cga gaa ctt caa cat gtg gaa
gaa caa cat (SEQ ID NO: 4)

oligo 2: tgg aag aac aac atg gtg gag cag atg cat
gag gac atc atc agc ctg tgg gac cag agc ctg aag ccc tgc
gtg aag ctg acc cc ctg tgc gtg acc tg aac tgc acc gac ctg
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= = = = = ::: = - - - - ■ - - - 

oligo 2 reverse rPsti \ - 
« cat etc g cc gcc ctt (SEQ ( 7 D " N ' o! *« 9Ct ,=a gtt ctt 

oligo 3 forward rPsti* • 
oligo 3: aac ate acc acc asn 

- - - - - «. - - r 

~ ~ - »= - - t 9 ;: :n ::: :i ;:: ::: 

ID NO: 8). www ywc 

oligo 3 reverse: gaa c » t „„_ 
15 ggc ggg (SEQ ID NO: 9, . ' ,9= " C 9 " 9=c 

9 gca acg aca aga agt to (SEO id no: lo) 
t,= aag Vac'Vg JT "° " 9 "° a * C »* «c age 

- cc gt g 9 z tn : i «• ^ •« - - , te cgg 
= r:i r:i ;: - = « = = = = = 

(SEQ x" no: ^ " °" 989 ^ "= 

» - - r?x »' i^r - - - - 

* «cg 0 ogt 9 U !T V„ "° 9t9 - ~ 

ol^go 5: aac tgc acg C gt ccc aac tac aac a* 0 
aag cgc ate cac ate ggc ccc ggg cgc gcc ttc " * 9C 

30 aaq aac atr 9 tac acc a <=c 

g ate ate gge aee ate etc eag g CC cac tgc aac ate 

tct aga (SEQ ID NO: 14) y atc 

oligo 5 reverse: gte gtt cca ctt ggc tct aaa M i- 
gtt gca (SEQ id NO: 15). 9 93t 
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oligo 6: gcc aag tgg aac gac acc ctg cgc cag ate 
gtg agc aag ctg aag gag cag ttc aag aac aag acc atc gtg
ttc ac cag agc agc ggc ggc gac ccc gag atc gtg atg cac
agc ttc aac tgc ggc ggc (SEQ ID NO: 17).

oligo 6 reverse (EcoRI): gca gta gaa gaa ttc gcc
gcc gca gtt ga (SEQ ID NO: 18).

oligo 7 forward (EcoRI): tca act gcg gcg gcg aat
tct tct act gc (SEQ ID NO: 19).

oligo 7: ggc gaa ttc ttc tac tgc aac acc agc ccc
ctg ttc aac agc acc tgg aac ggc aac aac acc tgg aac aac
acc acc ggc agc aac aac aat att acc ctc cag tgc aag atc
aag cag atc atc aac atg tgg cag gag gtg ggc aag gcc atg
tac gcc ccc ccc atc gag ggc cag atc cgg tgc agc agc (SEQ
ID NO: 20)

oligo 7 reverse: gca gac cgg tga tgt tgc tgc tgc
acc gga tct ggc cct c (SEQ ID NO: 21).

oligo 8 forward: cga ggg cca gat ccg gtg cag cag
caa cat cac cgg tct g (SEQ ID NO: 22).

oligo 8: aac atc acc ggt ctg ctg ctg acc cgc gac
ggc ggc aag gac acc gac acc aac gac acc gaa atc ttc cgc
ccc ggc ggc ggc gac atg cgc gac aac tgg aga tct gag ctg
tac aag tac aag gtg gtg acg atc gag ccc ctg ggc gtg gcc
ccc acc aag gcc aag cgc cgc gtg gtg cag cgc gag aag cgc
(SEQ ID NO: 23).

oligo 8 reverse (NotI): cgc ggg cgg ccg ctt tag
cgc ttc tcg cgc tgc acc ac (SEQ ID NO: 24).
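
Because the reverse primers are written 5' to 3', their reverse complement should match the sense strand across the junction of two adjacent long oligonucleotides. The following sketch checks this for oligo 1 reverse, using the sequences as read from the list above with OCR "e" taken as "c"; the helper name is illustrative.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (spaces are ignored)."""
    return seq.lower().replace(" ", "").translate(COMPLEMENT)[::-1]


oligo1_reverse = "cca cca tgt tgt tct tcc aca tgt tga agt tct c"
oligo1_3prime_end = "gagaacttcaacatg"        # last 15 nt of oligo 1
oligo2_5prime_end = "tggaagaacaacatggtgg"    # first 19 nt of oligo 2

sense = reverse_complement(oligo1_reverse)
print(sense)
# The primer spans the junction: end of oligo 1 followed by start of oligo 2.
print(sense == oligo1_3prime_end + oligo2_5prime_end)
```

The same check can be run on any other forward/reverse pair in the list whose sequences are legible.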

The following oligonucleotides were used for the
construction of the ratTHY-1env gene.

oligo 1 forward (BamHI/Hind3): cgc ggg gga tcc
aag ctt acc atg att cca gta ata agt (SEQ ID NO: 25).

oligo 1: atg aat cca gta ata agt ata aca tta tta
tta agt gta tta caa atg agt aga gga caa aga gta ata agt
tta aca gca tct tta gta aat caa aat ttg aga tta gat tgt
aga cat gaa aat aat aca aat ttg cca ata caa cat gaa ttt
tca tta acg (SEQ ID NO: 26).
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oligo i reverse (EcoRi/Mim i . 
cgt taa tea aaa «. (ECoRl/Mlul > : <39g gaa ttc acg 

aa tga aaa ttc atg ttg (SEQ id NO: 27) 

oligo 2 forward (BamHl/Mlul) : C gc aaa tM > 
gaa aaa aaa aaa cat (SEQ id N0: 28 , . 9 tCC acff Cgt 

5 oligo 2: cgt gaa aaa aaa aaa cat gta tta 

aca tta gga gta cca gaa cat aca ta t * 9 "* 

^ r - - - - - - « Lt ::; :;: ::; - - 

10 oligo 2 reverse (EcoRi/saci) • coc «.«. 

aca rat *. _ oc caa ttc gag etc 

aca cat ata ate tec (SEQ ID NO: 30). 

oligo 3 forward (BamHl/Saci) : — <. 

aga gta agt gga caa ( SE Q id NO: 31)' ™" ^ ^ **' 

tta «a tta agt tta acrt- 4.4. 

«t , ta agt tta t9a (S % Q ID x 3 " °" 9ca a - 9 " 
« ,ct t„ ztz ( T R r oti> ■■ ■*= «- « ** 

aCt tat aaa at c (SEQ ID NO: 33). 
Short, overlapping 15 to [...] oligonucleotides [...]
were used to amplify the long
oligonucleotides by polymerase chain reaction (PCR).
Typical PCR conditions were: 35 cycles, 55°C [...]
to generate longer fragments consisting of two
adjacent small fragments. These longer fragments were
[...] CDM7-derived plasmid containing a leader [...]
NheI/PstI/MluI/EcoRI/BamHI polylinker.

The following solutions were used in these
reactions: [...]
was complemented with 10% DMSO to increase fidelity of
the Taq polymerase.

Small scale DNA preparation

Transformed bacteria were grown in 3 ml LB
cultures for more than 6 hours or overnight.
Approximately 1.5 ml of each culture was poured into 1.5
ml microfuge tubes, spun for 20 seconds to pellet cells
and resuspended in 200 µl of solution I. Subsequently
400 µl of solution II and 300 µl of solution III were
added. The microfuge tubes were capped, mixed and spun
for > 30 sec. Supernatants were transferred into fresh
tubes and phenol extracted once. DNA was precipitated by
filling the tubes with isopropanol, mixing, and spinning
in a microfuge for > 2 min. The pellets were rinsed in
70% ethanol and resuspended in 50 µl dH2O containing 10
µl of RNAse A. The following media and solutions were
used in these procedures: LB medium (1.0% NaCl, 0.5%
yeast extract, 1.0% tryptone); solution I (10 mM EDTA pH
8.0); solution II (0.2 M NaOH, 1.0% SDS); solution III
(2.5 M KOAc, 2.5 M glacial acetic acid); phenol (pH
adjusted to 6.0, overlaid with TE); TE (10 mM Tris HCl,
pH 7.5, 1 mM EDTA pH 8.0).
Large scale DNA preparation

One liter cultures of transformed bacteria were
grown 24 to 36 hours (MC1061p3 transformed with pCDM
derivatives) or 12 to 16 hours (MC1061 transformed with
pUC derivatives) at 37°C in either M9 bacterial medium
(pCDM derivatives) or LB (pUC derivatives). Bacteria
were spun down in 1 liter bottles using a Beckman J6
centrifuge at 4,200 rpm for 20 min. The pellet was
resuspended in 40 ml of solution I. Subsequently, 80 ml
of solution II and 40 ml of solution III were added and
the bottles were shaken semivigorously until lumps of 2
to 3 mm size developed. The bottle was spun at 4,200 rpm
for 5 min and the supernatant was poured through


[...] min. The pellet was resuspended in [...]
and added to 4.5 g of cesium chloride, [...]
ethidium bromide, and 0.1 ml of 1% Triton X100 solution.
The tubes were [...]. The supernatant was transferred
into Beckman Quick Seal ultracentrifuge tubes, which were
then sealed and spun in a Beckman ultracentrifuge
[...] angle rotor at 80,000 rpm for [...] hours.
The band was extracted by visible light using a 1 ml
syringe and 20 gauge needle. An equal volume of [...] was
added to the extracted material. DNA was extracted once
with n-butanol saturated with 1 M sodium chloride,
followed by addition of an equal volume of 10 M ammonium
acetate/1 mM EDTA. The material was poured into a [...]
snap tube which was then filled to the top with absolute
ethanol [...] and spun in a Beckman [...] centrifuge at
[...] ethanol and resuspended in 0.5 to 1 ml of H2O. The DNA
concentration was determined by [...]
a dilution of 1:200 (1 OD260 = 50 µg/ml).

The following media and buffers were used in these
procedures: M9 bacterial medium (10 g M9 [...],
[...] hydrolysed, 10 ml M9 additions, 7.5
µg/ml tetracycline (500 µl of a 15 mg/ml stock solution),
12.5 µg/ml ampicillin (125 µl of a 100 mg/ml stock
solution)); M9 additions ([...] CaCl2, [...],
[...] µg/ml thiamine, 70% glycerol); LB medium (1.0% NaCl,
[...]); solution I ([...]); solution II (0.2 M NaOH,
1.0% SDS); solution III
(2.5 M KOAc, 2.5 M HOAc).
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Sequencing 

Synthetic genes were sequenced by the Sanger
dideoxynucleotide method. In brief, 20 to 50 µg double-
stranded plasmid DNA were denatured in 0.5 M NaOH for 5
min. Subsequently the DNA was precipitated with 1/10
volume of sodium acetate (pH 5.2) and 2 volumes of
ethanol and centrifuged for 5 min. The pellet was washed
with 70% ethanol and resuspended at a concentration of 1
µg/µl. The annealing reaction was carried out with 4 µg
of template DNA and 40 ng of primer in 1x annealing
buffer in a final volume of 10 µl. The reaction was
heated to 65°C and slowly cooled to 37°C. In a separate
tube 1 µl of 0.1 M DTT, 2 µl of labeling mix, 0.75 µl of
dH2O, 1 µl of [35S] dATP (10 µCi), and 0.25 µl of
Sequenase (12 U/µl) were added for each reaction. Five
µl of this mixture were added to each annealed primer-
template tube and incubated for 5 min at room
temperature. For each labeling reaction 2.5 µl of each
of the 4 termination mixes were added on a Terasaki plate
and prewarmed at 37°C. At the end of the incubation
period 3.5 µl of labeling reaction were added to each of
the 4 termination mixes. After 5 min, 4 µl of stop
solution were added to each reaction and the Terasaki
plate was incubated at 80°C for 10 min in an oven. The
sequencing reactions were run on 5% denaturing
polyacrylamide gel. An acrylamide solution was prepared
by adding 200 ml of 10x TBE buffer and 957 ml of dH2O to
100 g of acrylamide:bisacrylamide (29:1). A 5%
polyacrylamide, 46% urea and 1x TBE gel was prepared by
combining 38 ml of acrylamide solution and 28 g urea.
Polymerization was initiated by the addition of 400 µl of
10% ammonium peroxodisulfate and 60 µl of TEMED. Gels
were poured using silanized glass plates and sharktooth
combs and run in 1x TBE buffer at 60 to 100 W for 2 to 4
hours (depending on the region to be read). Gels were



transferred to Whatman blotting paper, dried at 80°C for
about 1 hour, and exposed to x-ray film [...] hours. The
following solutions were used in [...]:
[...] dTTP); Termination Mixes [...]
(one each)); [...] ([...] formamide, [...]
EDTA, 0.05% bromphenol blue, [...]); [...] solution
([...] g polyacrylamide, [...] g bisacrylamide, 200 ml 1x
TBE, 957 ml dH2O).

Cytoplasmic RNA was isolated from calcium
phosphate transfected 293T cells 36 hours post
transfection [...] (Current Protocols in Molecular
Biology, Ausubel et al., eds., Wiley & Sons, New York
[...]). [...] Proteinase K were added
to 0.2% and 0.2 mg/ml respectively. The cytoplasmic
extracts were incubated at 37°C for 20 min,
phenol/chloroform extracted twice and [...]. The
[...] was dissolved in 100 µl [...].

The following solutions were used in this
procedure: Lysis Buffer (TE containing with 50 mM Tris pH
8.0, 100 mM NaCl, 5 mM MgCl2, 0.5% NP40); Buffer I (TE
[...] RNAse inhibitor, 0.1 U/µl RNAse free DNAse [...]); Stop
buffer (50 mM EDTA, 1.5 M NaOAc, [...]).

Slot blot analysis

For slot blot analysis 10 µg of cytoplasmic RNA
was dissolved in 50 µl dH2O to which 150 µl of 10x
SSC/18% formaldehyde were added. The solubilized RNA was
then incubated at 65°C for 15 min and spotted onto the
membrane with a slot blot apparatus. Radioactively labelled
probes of 1.5 kb gp120IIIb and syngp120mn fragments were used for
hybridization. Each of the two fragments was random
labelled in a 50 µl reaction with 10 µl of 5x oligo-
labelling buffer, 8 µl of 2.5 mg/ml BSA, 4 µl of α-[32P]-
dCTP (20 µCi/µl; 6000 Ci/mmol), and 5 U of Klenow
fragment. After 1 to 3 hours incubation at 37°C 100 µl
of TE were added and unincorporated α-[32P]-dCTP was
eliminated using a G50 spin column. Activity was measured
in a Beckman beta-counter, and equal specific activities
were used for hybridization. Membranes were pre-
hybridized for 2 hours and hybridized for 12 to 24 hours
at 42°C with 0.5 x 10^6 cpm probe per ml hybridization
fluid. The membrane was washed twice (5 min) with
washing buffer I at room temperature, for one hour in
washing buffer II at 65°C, and then exposed to x-ray
film. Similar results were obtained using a 1.1 kb
NotI/SfiI fragment of pCDM7 containing the 3' untranslated
region. Control hybridizations were done in parallel
with a random-labelled human beta-actin probe. RNA
expression was quantitated by scanning the hybridized
nitrocellulose membranes with a Magnetic Dynamics
phosphorimager.

The following solutions were used in this
procedure:

5x Oligo-labelling buffer (250 mM Tris HCl, pH 8.0, 25 mM
MgCl2, 5 mM β-mercaptoethanol, 2 mM dATP, 2 mM dGTP, 2 mM
dTTP, 1 M Hepes pH 6.6, 1 mg/ml hexanucleotides [dNTP]6);

Hybridization Solution ( M sodium phosphate, 250 mM
NaCl, 7% SDS, 1 mM EDTA, 5% dextrane sulfate, 50%
formamide, 100 µg/ml denatured salmon sperm DNA); Washing
buffer I (2x SSC, 0.1% SDS); Washing buffer II (0.5x SSC,
0.1% SDS); 20x SSC ([...] NaCl, [...] Na3citrate,
[...] adjusted [...]).

Vaccinia recombination used a modification of the
method described by Romeo and Seed [...].
[...] 90% confluency were infected with [...]
without calf serum. After 24 hours, the cells
were transfected by calcium phosphate [...]
DNA per dish. After an additional 24 [...]
the cells were scraped off the plate, [...]
resuspended in a volume of [...]. After 3 freeze/thaw
cycles [...] was added to [...] and [...] were
incubated for 20 min. A dilution series of [...]
of this lysate was used to infect [...]
of CV1 cells, that had been pretreated with [...]
mycophenolic acid, 0.25 mg/ml xanthin and [...]
for 6 hours. Infected cells were cultured for
2 to 3 days, and subsequently stained with
[...] antibody [...] against gp120 and an alkaline
phosphatase conjugated secondary antibody. Cells were
incubated with 0.33 mg/ml NBT and [...] BCIP [...]
and finally [...] agarose in PBS.
Positive [...]
Tris pH 9.0. The plaque purification was repeated once.
To produce high titer stocks the infection was slowly
scaled up. Finally, one large plate of Hela cells was
infected with [...] of virus of the previous [...].
Infected cells were detached in 3 ml of PBS, lysed with a
Dounce homogenizer and cleared from larger debris by
[...]
and express HIV-1 IIIB gp120 under the 7.5 mixed
early/late promoter (Earl et al., J. Virol., 65:31,
1991). In all experiments with recombinant vaccinia cells
were infected at a multiplicity of infection of at least
10.

The following solution was used in this procedure:
AP buffer (100 mM Tris HCl, pH 9.5, 100 mM NaCl, 5 mM
MgCl2)

Cell culture

The monkey kidney carcinoma cell lines CV1 and
Cos7, the human kidney carcinoma cell line 293T, and the
human cervix carcinoma cell line Hela were obtained from
the American Tissue Typing Collection and were maintained
in supplemented IMDM. They were kept on 10 cm tissue
culture plates and typically split 1:5 to 1:20 every 3 to
4 days. The following medium was used in this
procedure:

Supplemented IMDM (90% Iscove's modified Dulbecco Medium,
10% calf serum, iron-complemented, heat inactivated 30
min 56°C, 0.3 mg/ml L-glutamine, 25 µg/ml gentamycin, 0.5
mM β-mercaptoethanol (pH adjusted with 5 M NaOH, 0.5
ml)).

Transfection

Calcium phosphate transfection of 293T cells was
performed by slowly adding 10 µg
plasmid DNA in 250 µl 0.25 M CaCl2 to the same volume of
2x HEBS buffer while vortexing. After incubation for 10
to 30 min at room temperature the DNA precipitate was
added to a small dish of 50 to 70% confluent cells. In
cotransfection experiments with rev, cells were
transfected with 10 µg gp120IIIb, gp120IIIbrre,
syngp120mnrre or rTHY-1enveg1rre and 10 µg of pCMVrev or
CDM7 plasmid DNA.
The following solutions were used in this
procedure: 2x HEBS buffer (280 mM NaCl, 10 mM KCl, 1.5 mM
[...], sterile filtered); 0.25 M CaCl2 (autoclaved).
Immunoprecipitation

After 48 to 60 hours medium was exchanged and
cells were incubated for additional 12 hours in Cys/Met-
free medium containing 200 µCi of 35S-translabel.
Supernatants were harvested and spun for 15 min at 3000
rpm to remove debris. After addition of protease
inhibitors leupeptin, aprotinin and PMSF to 2.5 µg/ml, 50
µg/ml, 100 µg/ml respectively, 1 ml of supernatant was
incubated with either 10 µl of [...] protein A sepharose
[...] (rTHY-1enveg1rre) or with protein A sepharose and 3
[...] of a purified CD4/immunoglobulin fusion protein
([...] provided by Behring) (all gp120 constructs) at
4°C for 12 hours on a rotator. Subsequently the protein
A beads were washed 5 times for 5 to 15 min each time.
[...] 10 µl of loading buffer [...] was added,
samples were boiled for 3 min and applied on
7% (all gp120 constructs) or 10% (rTHY-1enveg1rre) SDS
polyacrylamide gels (TRIS pH [...] buffer in the resolving,
TRIS pH 6.8 buffer in the stacking gel, TRIS-glycin
running buffer, Maniatis et al. 198S). Gels were fixed
in 10% acetic acid and 10% methanol, incubated with
Amplify for 20 min, dried and exposed [...].

The following buffers and solutions were used in
this procedure: Wash buffer (100 mM Tris, pH 7.5, [...]
NaCl, 5 mM CaCl2, [...] NP-40); 5x Running Buffer (125 mM
Tris, 1.25 M Glycin, 0.5% SDS); Loading buffer (10%
glycerol, 4% SDS, 4% β-mercaptoethanol, 0.02% bromphenol
blue).

Immunofluorescence

293T cells were transfected by calcium phosphate
[...] were stained with the monoclonal antibody OX-7 at a
dilution of 1:250 at 4°C for 20 min, washed with PBS and
subsequently incubated with a 1:500 dilution of a FITC-
conjugated goat anti-mouse immunoglobulin antiserum.
Cells were washed again, resuspended in 0.5 ml of a
fixing solution, and analyzed on an EPICS XL
cytofluorometer (Coulter).

The following solutions were used in this
procedure:

PBS (137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4, 1.4 mM
KH2PO4, pH adjusted to 7.4); Fixing solution (2%
formaldehyde in PBS).
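
The PBS recipe above is given in millimolar concentrations. A minimal sketch of the conversion to grams per litre follows; the molar masses are for the anhydrous salts, which is an assumption (the text does not say which hydrates were used), and the names are illustrative.

```python
# PBS composition in mM, as listed in the text.
PBS_MM = {"NaCl": 137.0, "KCl": 2.7, "Na2HPO4": 4.3, "KH2PO4": 1.4}

# Molar masses in g/mol for the ANHYDROUS salts (assumption: the
# text does not specify hydrate forms).
MOLAR_MASS = {"NaCl": 58.44, "KCl": 74.55, "Na2HPO4": 141.96, "KH2PO4": 136.09}


def grams_per_litre(conc_mm, molar_mass):
    """Convert mM concentrations to grams of salt per litre of buffer."""
    return {salt: conc_mm[salt] / 1000.0 * molar_mass[salt] for salt in conc_mm}


pbs_grams = grams_per_litre(PBS_MM, MOLAR_MASS)
for salt, grams in pbs_grams.items():
    print(f"{salt}: {grams:.3f} g/l")
```

If a hydrated salt is used instead, only its entry in `MOLAR_MASS` needs to change.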
ELISA

The concentration of gp120 in culture supernatants
was determined using CD4-coated ELISA plates and goat
anti-gp120 antisera in the soluble phase. Supernatants
of 293T cells transfected by calcium phosphate were
harvested after 4 days, spun at 3000 rpm for 10 min to
remove debris and incubated for 12 hours at 4°C on the
plates. After 6 washes with PBS, 100 µl of goat anti-
gp120 antisera diluted 1:200 were added for 2 hours. The
plates were washed again and incubated for 2 hours with a
peroxidase-conjugated rabbit anti-goat IgG antiserum
diluted 1:1000. Subsequently the plates were washed and
incubated for 30 min with 100 µl of substrate solution
containing 2 mg/ml o-phenylenediamine in sodium citrate
buffer. The reaction was finally stopped with 100 µl of
4 M sulfuric acid. Plates were read at 490 nm with a
Coulter microplate reader. Purified recombinant
gp120IIIb was used as a control. The following buffers
and solutions were used in this procedure: Wash buffer
(0.1% NP40 in PBS); Substrate solution (2 mg/ml o-
phenylenediamine in sodium citrate buffer).
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Use 

The synthetic genes of the invention are useful
for expressing a protein normally expressed in
mammalian cells in cell culture (e.g., for
production of human proteins such as hGH, TPA, Factor
VII, and Factor IX). The synthetic genes of the
invention are also useful for gene therapy.
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SEQUENCE LISTING 


(1) GENERAL INFORMATION: 

(i) APPLICANT: SEED, BRIAN 

(ii) TITLE OF INVENTION: OVEREXPRESSION OF MAMMALIAN AND VIRAL 

PROTEINS 

(iii) NUMBER OF SEQUENCES: 37 

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Fish & Richardson

(B) STREET: 225 Franklin Street

(C) CITY: Boston

(D) STATE: Massachusetts

(E) COUNTRY: U.S.A.

(F) ZIP: 02110-2804

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30B

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/308,286 

(B) FILING DATE: 19-SEP-1994

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: CLARK, PAUL T

(B) REGISTRATION NUMBER: 30,162

(C) REFERENCE/DOCKET NUMBER: 00786/226001

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (617) 542-5070

(B) TELEFAX: (617) 542-8906

(C) TELEX: 200154 


(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
CGCGCCCTAG CCACCGAGAA GCTG 24 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 196 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
ACCGAGAAGC TGTCCGTGAC CCTCTACTAC CCCCTCCCCC TGTGCAAGAG ACCCCACCAC 60
CACCCTGTTC TCCCCCACCC ACCCCAACCC GTACCACACC GACGTGCACA ACGTGTGCCC 120
CACCCACCCC TGCGTGCCCA CCCACCCCAA CCCCCAGGAG GTGGAGCTCG TGAACGTGAC 180
CGAGAACTTC AACATC 196

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
CCACCATCTT GTTCTTCCAC ATCTTGAACT TCTC 34
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
GACCGAGAAC TTCAACATGT GGAAGAACAA CAT 33

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
TGGAAGAACA ACATGCTGCA GCAGATGCAT CAGCACATCA TCAGCCTGTG GGACCAGAGC 60 
CTGAAGCCCT GCCTGAAGCT GACCCCCTGT GCGTGACCTG AACTGCACCG ACCTGAGGAA 
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GCCGGCGAGA TG 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
CTTGAAGCTG CAGTTCTTCA TCTCGCCGCC CTT 33 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GAAGAACTGC AGCTTCAACA TCACCACCAG C 31
(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 195 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

AACATCACCA CCACCATCCG CCACAAGATG CAGAAGCAGT ACGCCCTGCT CTACAAGCTG 60

GATATCGTGA GCATCGACAA CGACAGCACC AGCTACCGCC TGATCTCCTG CAACACCAGC 120

GTGATCACCC AGGCCTGCCC CAAGATCAGC TTCGAGCCCA TCCCCATCCA CTACTGCGCC 180

CCCGCCGGCT TCGCC 195
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
GAACTTCTTC TCCCCCCCCA ACCCGCCGCC 30

(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
CCCCCCCCGC CCCCTTCCCC ATCCTCAACT CCAACCACAA CAACTTC 47
(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 198 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

ZZIT MTTCASCM e " a " e " e ™™ — - 

C " CC °° CCTCTOST ~««« * »c=CC MC CT 

«C M GAACTTCACC „c«CCC=X ,„CCXTC„ CCT.C.CCTC llc 
AATCAGACCC TCCACATC 198

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
ACTTGCGACC CGTCCACTTG ATCTCCACGC TCTC 34
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
GAGAGCGTGC AGATCAACTG CACGCGTCCC 30 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 120 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
AACTGCACGC GTCCCAACTA CAACAAGCGC AAGCGCATCC ACATCGGCCC CGGGCGCGCC 60 
TTCTACACCA CCAAGAACAT CATCGGCACC ATCCTCCAGG CCCACTGCAA CATCTCTAGA 120 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GTCGTTCCAC TTCGCTCTAG AGATGTTCCA 30 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GCAACATCTC TAGAGCCAAG TGGAACGAC 29 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 131 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

GATCCTCATC CACACCTTCA 120
ACTCCGGCGG C 131

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

CCAGTACAAr: ii<m». 

»wuv.>_i,t; CGCAGTTCA 

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
TCAACTCCCC CCCCCAATTC TTCTACTGC 29
(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 195 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

ZZZZ TCTACTCCAA CACCAGCCCC ctcttcaac * — c 60 

ACCTGGAACA ACACCACCGG CAGCAACAAC AATATTACCC TCCAGTGCAA GATCAAGCAC 120
= CA „ GGA GGTGCGCAAG GCCATGTACG CCCCCCCCAT CGAG^ £

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:
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(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GCAGACCGGT GATGTTGCTG CTGCACCGGA TCTGGCCCTC 40 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
CGAGGGCCAG ATCCGGTGCA GCAGCAACAT CACCGGTCTG 40 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 242 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

AACATCACCG GTCTGCTGCT GCTGCTCACC CGGACGGCGG CAAGGACACC GACACCAACG 60 

ACACCGAAAT CTTCCGCGAC GGCGGCAAGG ACACCAACGA CACCGAAATC TTCCCCCCCG 120 

GCGGCGGCGA CATGCGCGAC AACTGGAGAT CTGAGCTGTA CAAGTACAAG GTGGTGACGA 180 

TCGAGCCCCT GCGCGTGGCC CCCACCAAGG CCAAGCGCGC GGTGGTGCAG CGCGACAAGC 240 


(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CCCCCCCCCC COCTTTACCO CTTCTCGCCC TCCACCAC 38

(2) INFORMATION FOR SEQ ID NO: 25:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
CCCCCCCCAT CCAACCTTAC CATCATTCCA GTAATAACT 39

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 165 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

— « _ r:rr rr™ *— - ~ 

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
CGCGGCCAAT TCACCCCTTA ATGAAAATTC ATGTTG 36

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
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(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
CGCGGATCCA CCCCTGAAAA AAAAAAACAT 30 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 149 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
CGTGAAAAAA AAAAACATGT ATTAAGTGGA ACATTACGAG TACCAGAACA TACATATAGA 60 
AGTAGAGTAA TTTGTTTAGT GATAGATTCA TAAAAGTATT AACATTAGCA AATTTTACAA 120 
CAAAAGATGA AGGAGATTAT ATGTGTGAG 149 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
CCCGAATTCG ACCTCACACA TATAATCTCC 30 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
CGCGGATCCG AGCTCAGAGT AAGTGGACAA 30 
(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 170 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
CTCACACTAA CTCCACAAAA TCCAACAACT ACTAATAAAA CAATAAATCT AATAACACAT 60
AAATTACTAA AATCTGAGCA ATAACTTTAT TAGTACAAAA TACAACTTCG TTATTATTAT 120
TATTATTAAC TTTAACTTTT TTACAACCAA CACATTTTAT AAGTTTATCA 170
(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
CGCGAATTCC CCCCCCCTTC ATAAACTTAT AAAATC 36
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1632 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:
CTCCACATCC ATTCTCCTCT AAACCACATA CCCCGCCACA CACCCTCACC TGCGCTCCCC 60
ACCTCCCCAC CCTGAGGCAA CACAACCCCA CAAACCATGC CCATGGGCTC TCTGCAACCG 120
CTGCCCACCT TCTACCTCCT GGGGATGCTG CTCGCTTCCC TGCTAGCCAC CCACAAGCTG 180
TGGCTCACCG TGTACTACGG CGTGCCCGTG TGCAAOCAGG CCACCACCAC CCTCTTCTGC 240
CCCACCGACG CCAAGGCGTA CGACACCGAC CTCCACAACC TGTGGGCCAC CCAGGCGTCC 300
GTGCCCACCG ACCCCAACCC CCAGGAGCTG GACCTCGTGA ACCTGACCGA GAACTTCAAC 360
ATGTGGAAGA ACAACATGGT GGAGCAGATG CATGACGACA TCATCAGCCT GTCGGACCAC 420
AGCCTGAAGC CCTGCGTGAA GCTGACCCCC CTCTGCGTGA CCCTGAACTG CACCGACCTG 480
AGGAACACCA CCAACACCAA CAACAGCACC GCCAACAACA ACAGCAACAG CGAGGGCACC 540
ATCAAGGGCG GCGAGATGAA CAACTCCACC TTCAACATCA CCACCAGCAT CCCCGACAAC 600
ATCCACAAGG AGTACGCCCT GCTGTACAAG CTGCATATCG TGACCATCCA CAACGACACC 660
ACCAGCTACC GCCTGATCTC CTGCAACACC AGCCTGATCA CCCAGGCCTG CCCCAAGATP 720
CACGGCATCC GCCCGCTGCT GAGCACCCAG CTCCTGCTGA ACGGCAGCCT GGCCGAGGAG 900
CAGGTGGTGA TCCGCAGCGA GAACTTCACC GACAACGCCA AGACCATCAT CGTGCACCTG 960
AATGACAGCG TGCAGATCAA CTGCACGCGT CCCAACTACA ACAAGCGCAA GCGCATCCAC 1020
ATCCGCCCCC CCCGCGCCTT CTACACCACC AAGAACATCA TCCGCACCAT CCCCCAGGCC 1080
CACTGCAACA TCTCTAGAGC CAAGTGGAAC GACACCCTGC CCCACATCGT GAGCAAGCTG 1140
AAGGAGCAGT TCAAGAACAA GACCATCGTG TTCAACCAGA GCAGCGGCGG CGACCCCGAG 1200
ATCGTGATGC ACAGCTTCAA CTGCGGCGGC GAATTCTTCT ACTCCAACAC CAGCCCCCTG 1260
TTCAACAGCA CCTGGAACGG CAACAACACC TGGAACAACA CCACCGGCAG CAACAACAAT 1320
ATTACCCTCC AGTGCAAGAT CAAGCAGATC ATCAACATGT GCCAGGAGGT GGGCAAGGCC 1380
ATGTACGCCC CCCCCATCGA GGGCCAGATC CGGTGCAGCA GCAACATCAC CGGTCTGCTG 1440
CTGACCCGCG ACGGCGGCAA GGACACCGAC ACCAACGACA CCGAAATCTT CCGCCCCGGC 1500
GGCGGCGACA TGCGCGACAA CTGGAGATCT GAGCTGTACA AGTACAAGGT GCTGACGATC 1560
GAGCCCCTGG GCGTGGCCCC CACCAAGGCC AAGCGCCGCG TGGTGCACCG CGAGAAGCGC 1620
TAAAGCGGCC GC 1632


(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2481 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

ACCGAGAAGC TGTGGGTGAC CGTCTACTAC GGCGTGCCCG TGTGGAAGGA GGCCACCACC 60
ACCCTGTTCT GCGCCAGCGA CGCCAAGGCG TACGACACCG AGCTGCACAA CGTGTGGGCC 120
ACCCAGGCGT CCGTGCCCAC CGACCCCAAC CCCCAGGAGG TGGACCTCCT GAACGTCACC 180
CAGAACTTCA ACATCTGGAA GAACAACATG CTGGAGCAGA TGCATGAGGA CATCATCAGC 240
CTGTGGCACC AGAGCCTGAA CCCCTGCGTC AAGCTGACCC CCCTGTGCCT GACCCTGAAC 300
TGCACCGACC TGAGGAACAC CACCAACACC AACAACAGCA CCGCCAACAA CAACAGCAAC 360
AGCGAGGGCA CCATCAAGCG CGGCGAGATG AAGAACTGCA CCTTCAACAT CACCACCACC 420
ATCCGCGACA AGATGCAGAA GGAGTACGCC CTGCTGTACA AGCTGGATAT CGTGACCATC 480
CACAACGACA GCACCAGCTA CCGCCTGATC TCCTGCAACA CCAGCGTGAT CACCCAGGCC 540
TCCCCCAAGA TCAGCTTCGA GCCCATCCCC ATCCACTACT GCGCCCCCGC CGCCTTCCCC 600
ATCCTGAACT GCAACGACAA GAAGTTCAGC GGCAAGGGCA GCTGCAAGAA CGTGACCACC 660
CTCCACTCCA [..........] [..........] GTGAGCACCC AGCTCCTCCT CAACGGCAGC 720
CTGCCCCAGC [..........] [..........] GAGAACTTCA CCGACAACGC CAACACCATC 780
ATCCTCCACC [..........] [..........] AACTGCACGC GTCCCAACTA CAACAAGCCC 840
AAGCGCATCC [..........] [..........] TTCTACACCA CCAAGAACAT CATCCGCACC 900
ATCCGCCAGC [..........] [..........] CCCAAGTGGA ACGACACCCT CCCCCACATC 960
CTGAGCAACC [..........] [..........] AAGACCATCG TCTTCAACCA GAGCACCGCC 1020
CCCCACCCCC [..........] [..........] AACTCCCCCC CCGAATTCTT CTACTCCAAC 1080
ACCAGCCCCC [..........] [..........] [..........] [..........] CACCACCGCC 1140
ACCAACAACA ATATTACCCT CCACTCCAAC ATCAAGCAGA TCATCAACAT CTGGCAGCAG 1200
CTGCGCAAGG CCATCTACCC CCCCCCCATC CACCCCCAGA TCCCCTCCAG CACCAACATC 1260
ACCCCTCTCC TCCTCACCCC CCACCCCCGC AACCACACCG ACACCAACCA CACCCAAATC 1320
TTCCGCCCCC [..........] [..........] AACTGGAGAT CTGACCTGTA CAAGTACAAG 1380
CTGCTGACGA TCGAGCCCCT GGGCGTCCCC CCCACCAAGC CCAAGCGCCG CGTGGTGCAG 1440
CCCGACAACC COGCCGCCAT CGGCCCCCTG TTCCTCCGCT TCCTGGCCCC GGCCGGCAGC 1500
ACCATGCGGG CCGCCAGCCT GACCCTGACC GTCCAGGCCC CCCTCCTCCT GACCGGCATC 1560
CTCCAGCACC AGAACAACCT CCTCCCCGCC ATCCAGGCCC ACCAGCATAT GCTCCAGCTC 1620
ACCCTGTCCG GCATCAAGCA GCTCCAGCCC CGCCTGCTGC CCGTCGACCC CTACCTGAAC 1680
CACCAGCACC TCCTGGCCTT CTCCGGCTCC TCCGGCAAGC TGATCTCCAC CACCACGGTA 1740
CCCTGGAACG CCTCCTCCAG CAACAACACC CTGCACGACA TCTCCAACAA CATCACCTCC 1800
ATGCAGTGCG AGCGCGAGAT CGATAACTAC ACCAGCCWA TCTACAGCCT GCTGGAGAAG 1860
AGCCAGACCC AGCAGGAGAA GAACGAGCAG GAGCTGCTGG AGCTCGACAA CTCGCCGAGC 1920
CTGTCGAACT GGTTCGACAT CACCAACTCG CTGTGCTACA TCAAAATCTT CATCATGATT 1980
CTGGGCGCCC TGCTGGGCCT CCGCATCGTG TTCGCCGTGC TCAGCATCGT GAACCGCGTG 2040
CGCCAGGGCT ACAGCCCCCT CACCCTCCAG ACCCGGCCCC CCGTGCCGCG CGCGCCCGAC 2100
CGCCCCGAGG GCATCGAGGA GGACCGCCGC GAGCGCCACC CCGACACCAG CGCCAGGCTC 2160
GTGCACGGCT TCCTGGCCAT CATCTGGGTC GACCTCCGCA CCCTCTTCCT GTTCAGCTAC 2220
CACCACCGCG ACCTGCTCCT GATCGCCGCC CGCATCGTGG AACTCCTAGC CCCCCCCGGC 2280
TGGCAGGTCC TGAACTACTG CTGGAACCTC CTCCAGTATT CGAGCCACCA GCTGAAGTCC 2340
ACCGCCCTGA GCCTGCTGAA CCCCACCCCC ATCGCCGTGG CCCAGGGCAC CGACCGCGTG 2400
ATCCAGCTCC TCCACAGGGC CGGGAGGGCG ATCCTCCACA TCCCCACCCG CATCCGCCAG 2460
GCCCTCGAGA [..........]

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 486 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:


ATGAATCCAG TAATAAGTAT AACATTATTA TTAAGTGTAT TACAAATGAG TAGAGGACAA 60
AGAGTAATAA GTTTAACAGC ATGTTTAGTA AATCAAAATT TGAGATTAGA TTGTACACAT 120
GAAAATAATA CACCTTTGCC AATACAACAT GAATTTTCAT TAACGCGTGA AAAAAAAAAA 180
CATGTATTAA GTGGAACATT AGGAGTACCA GAACATACAT ATAGAAGTAG AGTAAATTTG 240
TTTAGTGATA GATTCATAAA AGTATTAACA TTAGCAAATT TTACAACAAA AGATGAAGGA 300
GATTATATGT GTGAGCTCAG AGTAAGTGGA CAAAATCCAA CAACTAGTAA TAAAACAATA 360
AATGTAATAA GAGATAAATT AGTAAAATGT GGAGGAATAA GTTTATTAGT ACAAAATACA 420
AGTTGGTTAT TATTATTATT ATTAAGTTTA AGTTTTTTAC AAGCAACAGA TTTTATAAGT 480
TTATGA 486


(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 485 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

ATGAACCCAG TCATCAGCAT CACTCTCCTG CTTTCAGTCT TGCAGATGTC CCGAGCACAG 60 

AGGGTGATCA GCCTGACAGC CTGCCTGGTG AACAGAACCT TCGACTGGAC TCCCGTCATG 120 

AGAATAACAC CAACTTGCCC ATCCAGCATG AGTTCAGCCT GACCCGAGAG AACAAGAACC 180 

ACGTCCTCTC AGGCACCCTG GGGCTTCCCG AGCACACTTA CCCCTCCCCC GTCAACCTTT 240 

TCAGTGACCG CTTTATCAAC GTCCTTACTC TAGCCAACTT GACCACCAAG GATGAGGGCG 300 

ACTACATGTG TGAACTTCGA GTCTCCGGCC AGAATCCCAC AACCTCCAAT AAAACTATCA 360 

ATCTGATCAG AGACAAGCTG GTCAAGTGTG GTGGCATAAG CCTGCTGCTT CAAAACACTT 420 

CCTGGCTGCT GCTGCTCCTG CTTTCCCTCT CCTTCCTCCA AGCCACGGAC TTCATTTCTC 480 

TGTGA 485 
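
The entries above follow a fixed layout: bases in groups of ten, with a running count at the end of each line. A minimal sketch, assuming that layout, checks that the printed counts agree with the bases actually present. The function name `read_listing_lines` is hypothetical, and the two inputs are SEQ ID NO:1 and SEQ ID NO:16 as printed in this listing.

```python
def read_listing_lines(lines):
    """Join grouped bases from sequence-listing lines and verify running counts."""
    sequence = ""
    for line in lines:
        tokens = line.split()
        running_count = None
        # A trailing all-digit token is taken as the running base count.
        if tokens and tokens[-1].isdigit():
            running_count = int(tokens.pop())
        bases = "".join(tokens).upper()
        # Anything other than A, C, G or T suggests a transcription error.
        if set(bases) - set("ACGT"):
            raise ValueError(f"unexpected characters in line: {line!r}")
        sequence += bases
        if running_count is not None and running_count != len(sequence):
            raise ValueError(
                f"line claims {running_count} bases, found {len(sequence)}"
            )
    return sequence


seq_1 = read_listing_lines(["CGCGCCCTAG CCACCGAGAA GCTG 24"])
seq_16 = read_listing_lines(["GCAACATCTC TAGAGCCAAG TGGAACGAC 29"])
print(len(seq_1), len(seq_16))
```

A check of this kind flags lines where a base was dropped or misread, since the running count no longer matches.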
What is claimed is:

1. A synthetic gene encoding a protein normally
expressed in mammalian cells wherein at least one non-
preferred or less preferred codon in the natural gene
encoding said protein has been replaced by a
preferred codon encoding the same amino acid.

2. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 110% of that
expressed by said natural gene in an in vitro
cell culture system under identical conditions.

3. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 150% of that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

4. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 200% of that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

5. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 500% of that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

6. The synthetic gene of claim 1 wherein said 
synthetic gene is capable of expressing said mammalian 
protein at a level which is at least ten times that 
expressed by said nature 
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7. The synthetic gene of claim 1 wherein at least 
10% of the codons in said natural gene are non-preferred
codons.


8. The synthetic gene of claim 1 wherein at least
50% of the codons in said natural gene are non-preferred
codons.


9. The synthetic gene of claim 1 wherein at least
50% of the non-preferred codons and less preferred codons
present in said natural gene have been replaced by
preferred codons.

10. The synthetic gene of claim 1 wherein at
least 90% of the non-preferred codons and less preferred
codons present in said natural gene have been replaced by
preferred codons.

11. The synthetic gene of claim 1 wherein said
protein is a retroviral or lentiviral protein.

12. The synthetic gene of claim 11 wherein said
protein is an HIV protein.

13. The synthetic gene of claim 12 wherein said
protein is selected from the group consisting of gag,
pol, and env.

14. The synthetic gene of claim 13 wherein said
protein is gp120 or gp160.

15. The synthetic gene of claim 1 wherein said
protein is a human protein.
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16. A method for preparing a synthetic gene 
encoding a protein normally expressed by mammalian cells, 
comprising identifying non-preferred and less-preferred 
codons in the natural gene encoding said protein and 
replacing one or more of said non-preferred and less- 
preferred codons with a preferred codon encoding the same 
amino acid as the replaced codon. 
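
The codon replacement recited in claim 16 can be illustrated with a short sketch. It assumes the preferred codons as read from the Summary (one per amino acid) and the standard genetic code. Amino acids for which the Summary lists no preferred codon (Glu, Met, Trp) and stop codons are left unchanged here, which is an assumption of this example rather than something the text states. The function names and the input sequence are illustrative only.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Preferred codons as read from the Summary (one-letter amino acid codes).
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG", "P": "CCC",
    "F": "TTC", "S": "AGC", "T": "ACC", "Y": "TAC", "V": "GTG",
}


def split_codons(cds):
    """Split a coding sequence into a list of three-base codons."""
    cds = cds.upper().replace(" ", "")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a coding sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def use_preferred_codons(cds):
    """Replace each codon with the preferred codon for the same amino acid.

    Codons whose amino acid has no entry in PREFERRED are kept as they are
    (assumption of this sketch).
    """
    return "".join(
        PREFERRED.get(CODON_TABLE[codon], codon) for codon in split_codons(cds)
    )


def fraction_changed(original, rewritten):
    """Fraction of codon positions that differ between two sequences."""
    a, b = split_codons(original), split_codons(rewritten)
    if len(a) != len(b):
        raise ValueError("sequences must contain the same number of codons")
    return sum(x != y for x, y in zip(a, b)) / len(a)


example = "ATGAATCCAGTAATAAGT"  # illustrative input, six codons
rewritten = use_preferred_codons(example)
print(rewritten, translate(rewritten), fraction_changed(example, rewritten))
```

Because each replacement is looked up by the encoded amino acid, the translated protein is unchanged; comparing `translate` on both sequences is a quick sanity check before a gene is synthesized.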
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5ynopl ZOrnn 

[Figure sheets: nucleotide sequences of the synthetic gp120 (syngp120mn) and gp160 genes; cf. SEQ ID NOS: 34 and 35.]
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FIGURE 5 
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